Experiments on authorship attribution by intertextual distance in English
Abstract
How can texts be said to be "near" to or "distant" from one another? Are different texts by the same author more similar to one another than texts by different authors? To answer these questions, a method is proposed that combines the calculation of intertextual distance with automatic clustering and tree classification. A blind test and several additional experiments show that this method offers an interesting tool for non-traditional authorship attribution.
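The abstract does not state the distance formula or the clustering procedure. As a minimal sketch, assuming the commonly cited form of Labbé's relative intertextual distance (the longer text's word counts are scaled to the shorter text's length N_A, absolute differences are summed over the joint vocabulary, and the sum is divided by 2N_A, giving a value between 0 and 1), the pipeline might look like this. Plain average-linkage agglomerative clustering is swapped in for the paper's tree classification, and a regex tokenizer stands in for any lemmatization; `tokenize`, `intertextual_distance`, and `average_linkage` are hypothetical names, not the author's code.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    # Crude lower-cased word tokens; a hypothetical stand-in for lemmatization.
    return re.findall(r"[a-z']+", text.lower())


def intertextual_distance(text_a: str, text_b: str) -> float:
    """Relative distance in [0, 1] (assumed form of Labbé's intertextual distance).

    The longer text's word counts are scaled to the shorter text's length,
    absolute count differences are summed over the joint vocabulary, and the
    sum is divided by 2 * N_A, where N_A is the shorter text's length.
    """
    freq_a, freq_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    n_a, n_b = sum(freq_a.values()), sum(freq_b.values())
    if n_a > n_b:  # ensure A is the shorter text
        freq_a, freq_b, n_a, n_b = freq_b, freq_a, n_b, n_a
    if n_a == 0:
        raise ValueError("both texts must contain at least one word")
    u = n_a / n_b  # scaling ratio applied to the longer text's counts
    vocab = freq_a.keys() | freq_b.keys()
    total = sum(abs(freq_a[w] - freq_b[w] * u) for w in vocab)
    return total / (2 * n_a)


def average_linkage(names: list[str], dist: dict[frozenset, float]):
    """Plain average-linkage agglomerative clustering (a substitute technique).

    Returns the merge history as (left members, right members, linkage distance).
    """
    clusters = [frozenset([n]) for n in names]

    def linkage(c1: frozenset, c2: frozenset) -> float:
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((sorted(c1), sorted(c2), linkage(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
    return merges


# Toy usage with invented sentences (not data from the paper).
texts = {
    "A1": "the ship sailed on the grey sea and the sailors sang",
    "A2": "the sailors sang as the ship sailed the grey sea",
    "B1": "parliament debated the new tax law late into the night",
}
dist = {
    frozenset((a, b)): intertextual_distance(texts[a], texts[b])
    for a, b in combinations(texts, 2)
}
for left, right, d in average_linkage(list(texts), dist):
    print(left, right, round(d, 3))
# Expected output:
# ['A1'] ['A2'] 0.182
# ['B1'] ['A1', 'A2'] 0.8
```

The two sentences sharing most of their vocabulary merge first, and the unrelated one joins last at a much larger distance. The toy sentences are far too short for a meaningful attribution; they only show the mechanics of distance followed by clustering.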
Similar articles
A Tool for Literary Studies: Intertextual Distance and Tree Classification
How to measure proximities and oppositions in large text corpora? Intertextual distance provides a simple and interesting solution. Its properties make it a good tool for text classification, and especially for tree-analysis which is fully presented and discussed here. In order to measure the quality of this classification, two indices are proposed. The method presented provides an accurate too...
About Labbé's "intertextual distance"
In the 2001 issue (Volume 8, Number 3) of the Journal of Quantitative Linguistics (pp. 213–231), Dominique and Cyril Labbé published a paper entitled “Inter-Textual Distance and Authorship Attribution. Corneille and Molière”. Dominique and Cyril Labbé (hereafter referred to as DCL) propose a new formula for the computation of dissimilarity between texts, as well as a distance scale. The...
Authorship Attribution: A Comparative Study of Three Text Corpora and Three Languages
The first objective of this paper is to carry out three experiments intended to evaluate authorship attribution methods on three test collections in three different languages (English, French, and German). In the first we represent and categorize 52 text excerpts written by nine authors and taken from 19th century English novels. In the second we work with 44 segments from French n...
Who Wrote this Novel? Authorship Attribution across Three Languages
Based on different writing style definitions, various authorship attribution schemes have been proposed to identify the real author of a given text or text excerpt. In this article we analyze the relative performance of word types or lemmas assigned to represent styles and texts. As a second objective we compare two authorship attribution approaches, one based on principal component analysis (P...
N-gram-based Author Profiles for Authorship Attribution
We present a novel method for computer-assisted authorship attribution based on character-level n-gram author profiles, which is motivated by an almost-forgotten pioneering method from 1976. The existing approaches to automated authorship attribution implicitly build author profiles as vectors of feature weights, as language models, or similar. Our approach is based on byte-level n-grams, it is l...
Journal:
- Journal of Quantitative Linguistics
Volume 14
Publication date: 2007